Approximate String Matching with Lempel-Ziv Compressed Indexes
Authors
Abstract
A compressed full-text self-index for a text T is a data structure that requires reduced space and can search for patterns P in T. Furthermore, the structure can reproduce any substring of T, so it actually replaces T. Despite the explosion of interest in self-indexes in recent years, there has not been much progress on search functionalities beyond basic exact search. In this paper we focus on indexed approximate string matching (ASM), which is of great interest in, for example, computational biology applications. We present an ASM algorithm that works on top of a Lempel-Ziv self-index. We consider the so-called hybrid indexes, which are the best in practice for this problem. We show that a Lempel-Ziv index can be seen as an extension of the classical q-samples index. We give new insights on this type of index, which can be of independent interest, and then apply them to the Lempel-Ziv index. We show experimentally that our algorithm has competitive performance and provides a useful space-time tradeoff compared to classical indexes.
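To make the ASM problem concrete, here is a minimal sketch of the classical online dynamic-programming solution (often attributed to Sellers), which scans the uncompressed text and reports every position where an occurrence of P ends with at most k insertions, deletions and substitutions. It is not the indexed Lempel-Ziv algorithm of the paper; it only illustrates the problem that such an index speeds up. The function name `approx_occurrences` and its parameters are assumptions chosen for illustration.

```python
def approx_occurrences(pattern: str, text: str, k: int) -> list[int]:
    """Return the 0-based end positions in `text` where some substring
    ending there is within edit distance k of `pattern`.

    Illustrative online scan only (O(len(pattern) * len(text)) time);
    hypothetical helper, not the paper's indexed algorithm.
    """
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any
    # substring of the text ending at the current position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # value of col[i - 1] in the previous column
        # col[0] stays 0 because an occurrence may start anywhere in the text.
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(
                prev_diag + cost,  # match or substitution
                old + 1,           # extra character in the text
                col[i - 1] + 1,    # missing character in the text
            )
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


if __name__ == "__main__":
    print(approx_occurrences("abc", "xxabcxx", 0))
    print(approx_occurrences("abc", "xxabcxx", 1))
```

With k = 0 the scan degenerates to exact matching. An index-based method avoids scanning the whole text, which is the point of the space-time tradeoff discussed in the abstract.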
Similar resources
A General Practical Approach to Pattern Matching over Ziv-Lempel Compressed Text
We address in this paper the problem of string matching on Lempel-Ziv compressed text. The goal is to search for a pattern in a text without uncompressing it. This is a highly relevant issue, since it is essential to have compressed text databases where efficient searching is still possible. We develop a general technique for string matching when the text comes as a sequence of blocks. This abstracts th...
Approximate String Matching with Compressed Indexes
A compressed full-text self-index for a text T is a data structure requiring reduced space and able to search for patterns P in T. It can also reproduce any substring of T, thus actually replacing T. Despite the recent explosion of interest in compressed indexes, there has not been much progress on functionalities beyond the basic exact search. In this paper we focus on indexed approximate s...
Approximate String Matching over Ziv-Lempel Compressed
We present a solution to the problem of performing approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to k insertions, deletions and substitutions, in O(mkn + R) ti...
Approximate String Matching over Ziv
We present a solution to the problem of performing approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to k insertions, deletions and substitutions, in O(mkn + R) ti...